CHAPTER 1


INTRODUCTION 

Because of the greater reliability demands placed upon modern digital systems, these
systems need to be designed with fault-tolerant capability. Concurrent error detection
(CED) can provide this capability by detecting errors caused by faults in the system during
its normal operation. With CED, an error can also be detected soon after it is
produced, resulting in shorter error latency and easier error recovery. One application of
CED is in a microprogram control unit (MCU).

Much research has been done in the area of CED, including coding, self-checking
circuits [Wake78], and time redundancy [PaFu82]. However, the CED concept is mainly
applied to various codes, data transmission, and simple functional units, such as arithmetic
units. Little work has been done in the control unit area. Previous work is primarily in
the use of classical self-checking circuits, using bit slicing, parity, and m-out-of-n codes in
simple control units to detect a limited class of faults [CSST73], [DiSo75], [Maki78],
[Will77]. These techniques are applicable neither to a complex control unit, like the
AM2910, nor to VLSI technology.

The only proposals applicable under the above two constraints have been self-checking
MOS-LSI circuits using coding [CrLa80] and duplication [Wake78], [SeLi80]. In [CrLa80],
the self-checking technique is applied to a microprocessor; however, the design is not an
actual chip design. Comparisons are done in terms of the number of transistors and not in
terms of actual chip area. The duplication technique requires not only duplicated control
units but also input and output checkers and an output check-bit generator. The area
redundancy of the duplication technique will be compared in Chapter 6 to that of the design
introduced in this thesis.


Recent research in the control unit area has proposed methods using a parallel signature
analyzer [Namj82], [DuMa83], a check symbol stored in the control memory [IyKi82],
or a separate watchdog monitor [SrTh82]. The signature error detection scheme is based on a
percentage of error detection but not on any fault model, and the scheme does not detect
incorrect branches. The check symbol scheme does not detect all illegal and incorrect
branches and does not have comprehensive bit error detection. The performance of the
watchdog monitor scheme depends on the complexity of the monitor.

None of the above proposals in the CED area is based on an actual chip layout. There
are only two proposals based on an actual chip layout: the Cfast chip [TWMTS82] and the
MCU chip [WFAD83]. The Cfast chip is a single-chip fault-tolerant microprocessor. The
Cfast chip uses simple PLAs with parity checking as its controller. There is no protection
for portions of the chip, such as the control bus and the ALU. Also, the retry PLA is not
implemented on the chip. The MCU chip is a microsequencer, based on the AM2910, with
CED. This thesis is on the redesign and layout of the MCU chip.

Chapter 2 gives a functional description of the AM2910, upon which our design is based.
Some modifications have been made for CED and technology considerations, and these
modifications are discussed. The resulting modified instruction set is also given.

Chapter 3 develops a fault model for the MCU. Instead of considering every possible
physical fault on the MCU, the functional-level fault model developed in [BaAb82] is used.
Six potential areas for error are discussed.

In Chapter 4, the modifications made to Wong's design are discussed. All modifications
are classified into four levels: system, layout, performance, and area. At the system level,
the changes are made to improve the CED fault coverage. Some modifications are made at the
layout level because of process changes. At the performance level, the main emphases are on
minimizing delay time and decreasing the clock cycle. Finally, at the area level, redundancy
is kept to a minimum.


Chapter 5 begins with an overview of the CED design approach and continues
with a detailed CED design of the MCU. Individual functional modules and checkers are
discussed.

Chapter 6 is devoted to the evaluation of the chip design in terms of area redundancy and
timing performance. For timing evaluation, TSIM, a MOS timing simulator, is used on all
modules. Based on the TSIM results, critical paths are found for the MCU. The redundancy and
performance of the MCU are compared to those of Wong's design and to those of the duplication
approach.

Chapter 7 provides conclusions and suggestions for further research. Finally, the
appendix contains figures for various cell designs in mixed notation.

CHAPTER 2


THE  MICROPROGRAM  SEQUENCER 




2.1. The AM2910

The AM2910 microprogram controller is a 12-bit bipolar address sequencer for up to
4K words of microprogram, as shown in Figure 2-1. During each microinstruction, the
multiplexer selects an address (Y) from one of four sources: the register/counter (R/C),
the microprogram counter (UPC), the stack, or the direct external input (X). The instruction
programmable logic array (PLA) decodes the 4-bit instruction input (I) into internal control
signals. The output of the PLA is affected by the condition code (CC) and the zero-detection
(R=0) signal from the R/C.


2.2.  Modifications 

Several modifications have been made to account for nMOS technology and CED
considerations, as shown in Figure 2-2. A two-phase clock (PHI1 and PHI2) is used. Instruction
execution and error checking are pipelined. During PHI1, the instruction is decoded; then,
during PHI2, the output address Y is generated. During the next clock cycle, the next
instruction is decoded in PHI1, and the status signals of the previous instruction are
generated in PHI2. Detailed timing operations are discussed in Section 6.1.
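
The overlap of instruction execution and error checking can be tabulated with a short script. This is only an illustrative sketch: the function name `pipeline_schedule` and the event labels are hypothetical, and the model records just the three events named in the text (decode, address generation, and status generation).

```python
def pipeline_schedule(n_instructions):
    """Map (cycle, phase) to the list of events occurring in that phase.

    Instruction k is decoded in PHI1 of cycle k, its address Y is
    generated in PHI2 of cycle k, and its status (error-check) signals
    are generated in PHI2 of cycle k + 1.
    """
    events = {}
    for k in range(n_instructions):
        events.setdefault((k, "PHI1"), []).append(f"decode I{k}")
        events.setdefault((k, "PHI2"), []).append(f"address Y{k}")
        events.setdefault((k + 1, "PHI2"), []).append(f"status I{k}")
    return events
```

Iterating over `sorted(pipeline_schedule(3).items())` lists the phases in time order and shows that, from the second cycle on, each PHI2 carries both the address of the current instruction and the status of the previous one.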

Several simplifications have also been made. The condition code enable CCEN has been
omitted. The three enable signals (PL, MAP, and VECT) are not complemented, as they
are in the AM2910. The register load signal RLD is also omitted; therefore, the R/C can be
loaded only by instructions. The UPC is incremented at every cycle, thus eliminating the
carry-in (CI) input. The omission of CI does not allow the MCU to operate as a slice of a
multichip MCU, as is the case with the AM2910. The Y output is always enabled, so the output
enable OE is eliminated. The stack FULL signal is omitted.

Figure 2-1. AM2910 Block Diagram.

2.3. The Instruction Set

The instruction set after the above modifications is shown in Table 2-1. The instruction
set is very similar to the AM2910 instruction set [MiBr80]. The major change is the
elimination of CCEN. For the JUMP ZERO (RESET) instruction, the address Y is set to 0
by setting all outputs of the UPC to 0.

Table 2-1. The Instruction Set.

                                         R/C       FAIL (CC = LOW) PASS (CC = HIGH)
I3-I0  MNEMONIC  NAME                    CONTENTS  Y      STACK    Y      STACK    R/C    ENABLE
-----  --------  ----------------------  --------  -----  -----    -----  -----    -----  ------
0      JZ        JUMP ZERO               X*        UPC    HOLD     [?]    HOLD     HOLD   PL
1      CJS       COND JSB PL             X         UPC    HOLD     EXT    PUSH     HOLD   PL
2      JMAP      JUMP MAP                X         EXT    HOLD     EXT    HOLD     HOLD   MAP
3      CJP       COND JUMP PL            X         UPC    HOLD     EXT    HOLD     HOLD   PL
4      PUSH      PUSH/COND LD CNTR       X         UPC    PUSH     UPC    PUSH     **     PL
5      JSRP      COND JSB R/PL           X         REG    PUSH     EXT    PUSH     HOLD   PL
6      CJV       COND JUMP VECTOR        X         UPC    HOLD     EXT    HOLD     HOLD   VECT
7      JRP       COND JUMP R/PL          X         REG    HOLD     EXT    HOLD     HOLD   PL
8      RFCT      REPEAT LOOP, CNTR ≠ 0   ≠ 0       STACK  HOLD     STACK  HOLD     DEC    PL
                                         = 0       UPC    POP      UPC    POP      HOLD   PL
9      RPCT      REPEAT PL, CNTR ≠ 0     ≠ 0       EXT    HOLD     EXT    [?]      DEC    PL
                                         = 0       UPC    HOLD     UPC    HOLD     HOLD   PL
A      CRTN      COND RETURN             X         UPC    HOLD     STACK  POP      HOLD   PL
B      CJPP      COND JUMP PL & POP      X         UPC    HOLD     EXT    POP      HOLD   PL
C      LDCT      LD CNTR & CONTINUE      X         UPC    HOLD     UPC    HOLD     LOAD   PL
D      LOOP      TEST END LOOP           X         STACK  HOLD     UPC    POP      HOLD   PL
E      CONT      CONTINUE                X         UPC    HOLD     UPC    HOLD     HOLD   PL
F      TWB       THREE-WAY BRANCH        ≠ 0       STACK  HOLD     UPC    POP      DEC    PL
                                         = 0       [?]    POP      UPC    POP      HOLD   PL

*  X = Don't care.
** If fail, HOLD, else LOAD.


CHAPTER  3 


FAULT  MODEL 


3.1. Functional Fault Model

Before designing CED capability into the MCU, a set of faults must be predefined so
that CED will detect errors caused by these faults. When the chip is as complex as the
MCU, the classical stuck-at fault is insufficient to describe all possible faults on the
chip.

Instead of defining faults on single lines, faults can be classified at the functional
level [BaAb82]. A module can be divided into functional blocks: PLA, decrementer,
incrementer, register, etc. Each block is described by the functional effects of the physical
faults on the function of the block. Based on the functional fault model approach, a fault model
is developed for the MCU.

3.2.  Fault  Model  for  the  MCU 

The MCU has six potential areas for error:

(1) Input control signals (I, CC).

(2)  External  inputs  (X). 

(3)  Control  decoding  and  transferring. 

(4)  Modules  (decrementer,  incrementer,  and  stack). 

(5)  Address  Bus. 


(6)  Power. 


The first two areas include errors occurring during signal transmission. The third
area includes errors in the instruction PLA and the PLA control bus. A single physical
failure in the PLA will cause unidirectional errors at the output [BaAb82]. Faults in the
control bus can cause misselection: selecting the wrong source, selecting two sources, or
selecting no source. Selection of two sources will result in unidirectional errors that can be
detected on the address bus. When no source is selected, all 1s will appear on the address bus.
The fourth area includes not only errors in the R/C, UPC, and stack but also errors in the
fanout lines of the PLA control signals. Because the errors resulting from faults in the R/C
and UPC are not clear, random errors are assumed. The fifth area covers all bus errors.
Bridging faults or broken bus lines cause unidirectional errors in nMOS technology. The
final area is power failure in the major fanout of the power and ground lines, which will
cause the affected nodes to float.
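
The misselection cases can be illustrated with a small model of a precharged bus. The sketch below rests on stated assumptions rather than on the circuit itself: every bus line starts HIGH, an enabled source can only pull lines LOW (so the bus carries the bitwise AND of all enabled sources), and the bus is 16 bits wide (12 address bits plus 4 check bits). The names `bus_value` and `only_one_to_zero` are hypothetical.

```python
WIDTH = 16  # assumed: 12 address bits plus 4 Berger check bits


def bus_value(sources, enabled, width=WIDTH):
    """Value seen on a precharged bus driven by the enabled sources."""
    value = (1 << width) - 1          # precharged: every line HIGH
    for name in enabled:
        value &= sources[name]        # a source can only pull lines LOW
    return value


def only_one_to_zero(intended, observed):
    """True if observed differs from intended only by 1 -> 0 changes."""
    return (observed & ~intended) == 0
```

Under these assumptions, enabling no source leaves all 1s on the bus, and enabling two sources yields a word that differs from the intended one only in the 1-to-0 direction, which is the unidirectional error described above.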


CHAPTER  4 


CHANGES  FROM  WONG’S  DESIGN 


This MCU design has many changes from Wong's design [WFAD83]. Detailed
information on Wong's design is available in [Wong82]. All the changes can be classified into
four levels: system, area, performance, and layout.

At the system level, changes are made to simplify the design without diminishing the
CED capability. First, the address checker has been eliminated, which is made possible by
checking the output of the MCU along with the output of the microstore using a CED
scheme proposed in [FuAb84]. The same scheme is used for the PLA and the PLA control
checker; similarly, the PLA input checker is eliminated. To improve the fault coverage of
the MCU, both the UPC and its check-bit generator are duplicated, and a checker is added
for checking the R/C against its check bits when it is loaded with external inputs.

At the layout level, three changes are made. The first is the change from the Texas
Instruments design rules to the Mead and Conway design rules [MeCo80]. Because of processing
requirements, buried contacts are used instead of butting contacts, and the value of
lambda is changed from 2.5 microns to 2 microns.

At the area level, the effort is to minimize area redundancy. A check-bit generator is
shared by both the R/C load checker and the PLA control checker. Two-rail totally
self-checking checkers are replaced by the TSC checkers proposed in [JhAb84] because the latter
require less area than the former. The elimination of the address checker, input checker,
and register tags at the system level, as mentioned before, also results in a reduction of area
redundancy.

At the performance level, the overall cycle time is reduced by pipelining the instruction
execution and checking. Also, many of the basic cells, such as adders and subtractors,
are redesigned to have shorter delay times by using pass transistor networks [Whit83].


CHAPTER  5 


THE  DESIGN  OF  THE  CED  MCU 

5.1. An Overview of the CED

All information is encoded with a Berger code, in which the check bits are the binary
count of the number of zeros in the information bits. The Berger code is selected because it
is a systematic code, in which the information bits are separate from the check bits, and
because it can detect all unidirectional errors in a code word.
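
The Berger code property can be checked exhaustively on a small word. The following is a minimal sketch, not the on-chip implementation: the function names are hypothetical, a word is modeled as a list of bits with the check bits appended most significant bit first, and the exhaustive search is run on short words only to keep it fast.

```python
from itertools import product


def berger_encode(info):
    """Append the binary count of zeros in info (MSB first) to info."""
    info = list(info)
    width = len(info).bit_length()        # enough bits to hold 0..len(info)
    zeros = info.count(0)
    check = [(zeros >> i) & 1 for i in reversed(range(width))]
    return info + check


def berger_ok(word, n):
    """True if the last bits of word equal the zero count of the first n."""
    info, check = word[:n], word[n:]
    value = sum(bit << i for i, bit in enumerate(reversed(check)))
    return info.count(0) == value


def all_unidirectional_errors_detected(n):
    """Force every subset of bits to 0, then to 1, in every code word."""
    width = n + n.bit_length()
    for info in product([0, 1], repeat=n):
        word = berger_encode(info)
        for mask in range(1, 2 ** width):
            for stuck in (0, 1):          # 1 -> 0 errors, then 0 -> 1 errors
                bad = [stuck if (mask >> i) & 1 else b
                       for i, b in enumerate(word)]
                if bad != word and berger_ok(bad, n):
                    return False
    return True
```

A 1-to-0 error can only raise the zero count of the information bits and lower the value of the check bits, so the two can never agree again; the same argument holds in the opposite direction. Errors in both directions at once are not covered: swapping a 0 and a 1 inside the information bits leaves the zero count unchanged.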

All input signals are checked within the chip. Instruction signals (I) and external
input signals (X) are encoded with a Berger code, as shown in Figure 2-2. Both CC and its
complement are input for two-rail checking.

The output address is encoded for off-chip checking. Three enable signals, pipeline
address enable (PL), map address enable (MAP), and vector address enable (VECT), are
output from the MCU. These enable signals select the source of the direct input. Since
only one of the three signals is HIGH at any time, the three enable signals form a 1-out-of-3
code for off-chip checking. The two clock signals are output from the chip so that any
error in the clock signals can be detected.
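
A 1-out-of-3 check is simple enough to state directly. The sketch below is illustrative only; the function name and argument names are hypothetical, and each signal is modeled as 0 (LOW) or 1 (HIGH).

```python
def enables_valid(pl, map_en, vect):
    """True if exactly one of the three enable signals is HIGH."""
    return pl + map_en + vect == 1
```

Any unidirectional error on a valid pattern is detected: a 1-to-0 error leaves no signal HIGH, and a 0-to-1 error leaves at least two signals HIGH. Only an error that clears one signal and sets another turns one valid pattern into a different valid pattern.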

A strongly fault-secure and strongly code-disjoint PLA is used [FuAb84]. A modified
Berger code is used over both the outputs and the inputs (I). The register/counter and the UPC
are duplicated to detect random errors. The stack is a strongly fault-secure shift stack. The
strongly fault-secure multiplexer takes on a bus structure. As mentioned in Chapter 4, the
checking of the address bus has been moved off-chip.

Two totally self-checking checkers are used. The first one is the R/C load checker.
When the R/C is loaded with external inputs, its register content is checked against its
Berger check bits. The checking is necessary to ensure that the value, if used for counting,
is correct.

The second checker is the PLA control checker. This checker provides error detection
in the following areas: input control signals, PLA decoding, and control signal transferring.
It also provides TSC capability to the stack and to the multiplexer because it is placed at the
end of the control bus, after the control signals have passed through the various modules.

The power and clock signals take on bus structures. The signals come into the chip
from one end and are routed to the other end of the chip through bus lines. The PLA control
checker is placed at the end of the power bus to detect power failure. The two clock
phases are output from the chip at the end of the clock bus.

5.2. Functional Description

The PLA has six inputs: the 4-bit instruction input (I), the condition code (CC), and the
register zero-detection signal (R=0). The zero-detection signal is an internal input. The PLA
generates nine internal control signals, two of which are also inverted at the PLA output.
Besides the control signals, the PLA also produces three enable signals: PL, MAP, and VECT.

The PLA is encoded in a modified Berger code [MaAD82]. As shown in Table 5-1, the
total number of zeros in the input instruction (I) and the 12-bit output ranges from 8 to 14.
The modified Berger code requires 3 bits to encode the values 0 to 6, which represent 8 to 14
zeros. Counting the 3-bit check symbol, the PLA generates a total of 17 outputs.
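
The saving from the modified code can be made concrete. The sketch below assumes that the check value is simply the zero count minus 8, which is one natural reading of "0 to 6 for 8 to 14 zeros"; the function name and the bit patterns used to exercise it are hypothetical and are not taken from Table 5-1.

```python
ZERO_MIN, ZERO_MAX = 8, 14    # range of zero counts stated in the text


def modified_berger(instr_bits, output_bits):
    """Check value for the 4 instruction bits plus the 12 PLA output bits."""
    zeros = list(instr_bits).count(0) + list(output_bits).count(0)
    if not ZERO_MIN <= zeros <= ZERO_MAX:
        raise ValueError(f"{zeros} zeros is outside {ZERO_MIN}..{ZERO_MAX}")
    return zeros - ZERO_MIN


# Bits needed for the offset count (0..6) versus a plain count (0..16).
CHECK_BITS = (ZERO_MAX - ZERO_MIN).bit_length()
PLAIN_BERGER_BITS = (4 + 12).bit_length()
```

Here `CHECK_BITS` evaluates to 3 and `PLAIN_BERGER_BITS` to 5, so restricting the count to the range that actually occurs saves two PLA outputs. One tally consistent with the text is 12 outputs (nine control signals and three enables), two inverted control signals, and three check bits, for 17 outputs in all.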

The R/C is used either as a register to hold a branch address or as a loop counter by
decrementing the content of the register. When the external input is loaded into the R/C, the
information is checked against the check bits by the R/C load checker. Once the register
has been decremented, the register should not be selected as the source of the multiplexer.
During PHI2, R/C 1 generates the R=0 signal for the PLA, while R/C 2 generates a second
zero-detection signal for two-rail checking.


Table 5-1. PLA Input and Output Patterns.

The UPC increments the current address at each clock cycle and generates the check
bits for the incremented address. When the RESET instruction (instruction 0) is executed,
the output of the UPC is set to address 0 and the output of the check-bit generator is set to
the corresponding Berger code. The UPC and its check-bit generator are both duplicated.
The outputs of the duplicated modules are wire-ANDed together, as shown in Figure 5-1.
If either copy is faulty, unidirectional errors result in the ANDed output,
which are detectable by the Berger code.
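
The effect of ANDing the two copies can be checked by brute force. In this sketch a word is an integer holding 12 address bits followed by 4 Berger check bits (an assumed layout), one copy is fault-free, and the other copy is allowed to output any 16-bit pattern at all. The function names are hypothetical.

```python
INFO_BITS, CHECK_BITS = 12, 4


def encode(addr):
    """Pack a 12-bit address with the count of its zero bits."""
    zeros = INFO_BITS - bin(addr).count("1")
    return (addr << CHECK_BITS) | zeros


def is_codeword(word):
    addr, check = word >> CHECK_BITS, word & ((1 << CHECK_BITS) - 1)
    return INFO_BITS - bin(addr).count("1") == check


def wired_and(copy_a, copy_b):
    return copy_a & copy_b


def escapes(good_addr):
    """Faulty-copy outputs that give a wrong but valid ANDed word."""
    good = encode(good_addr)
    return [f for f in range(1 << (INFO_BITS + CHECK_BITS))
            if wired_and(good, f) != good and is_codeword(wired_and(good, f))]
```

For any address, `escapes` returns an empty list: the ANDed output either equals the correct word (the fault is masked) or is a non-code word. This holds because ANDing with the correct copy can only turn 1s into 0s, whatever the faulty copy produces.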

The 5-word by 16-bit last-in, first-out stack provides return addresses for
microsubroutines or loops. The stack is a modified version of the shift stack in [MeCo80]. The
stack is PUSHed during PHI1 from the UPC bus and the check-bit bus, and is POPped during
PHI2 onto the address bus. Both information and check bits are stored in the stack. The stack
is made TSC by checking the control signals after they have passed through the stack.

The address bus, the output of the multiplexer, is precharged during PHI1. During
PHI2, one of the four possible inputs is enabled onto the address bus. The multiplexer is
made TSC by checking the enable control signals after they pass through the
multiplexer.

The totally self-checking checker consists of a check-bit generator and a totally
self-checking equality checker. The check-bit generator is a counter using full adders and half
adders connected in a Wallace tree form [WiWt77], as shown in Figure 5-2. The equality
checker is built from four-input two-rail TSC checkers in an Anderson tree [Ande71]. Two
TSC checkers are used: the R/C load checker and the PLA control checker.
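
The two halves of the checker can be sketched at the functional level. The code below is an illustration under assumptions, not the cell design: the counter reduces bits with full and half adders column by column and does not model the depth of the actual Wallace tree, and the two-rail cell uses the standard textbook equations, which may differ from the checker cells used on the chip. All function names are hypothetical.

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    return a ^ b, a & b


def count_ones(bits):
    """Count the 1s in bits using only full and half adders."""
    columns = {0: list(bits)}             # weight -> bits of that weight
    weight = 0
    while weight in columns:
        col = columns[weight]
        while len(col) > 1:
            if len(col) >= 3:
                s, carry = full_adder(col.pop(), col.pop(), col.pop())
            else:
                s, carry = half_adder(col.pop(), col.pop())
            col.insert(0, s)
            columns.setdefault(weight + 1, []).append(carry)
        weight += 1
    return sum(col[0] << w for w, col in columns.items() if col)


def berger_check_bits(info, width):
    """Zero count of info as a list of bits, MSB first."""
    zeros = count_ones([1 - b for b in info])
    return [(zeros >> i) & 1 for i in reversed(range(width))]


def two_rail_cell(p, q):
    """Two input pairs in, one pair out; complementary only if both are."""
    (x0, y0), (x1, y1) = p, q
    return (x0 & x1) | (y0 & y1), (x0 & y1) | (y0 & x1)


def two_rail_tree(pairs):
    level = list(pairs)
    while len(level) > 1:
        nxt = [two_rail_cell(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def checker(info, stored_check):
    """True if stored_check matches the check bits generated from info."""
    generated = berger_check_bits(info, len(stored_check))
    f, g = two_rail_tree([(s, 1 - c) for s, c in zip(stored_check, generated)])
    return f != g                         # complementary outputs: no error
```

Each stored check bit is paired with the complement of the generated bit, so a pair is complementary exactly when the two bits agree; the tree output is complementary only when every pair is.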

The R/C load checker, Figure 5-3, operates only when the R/Cs are loaded. When
the LOAD control signal is HIGH, the external input signals (X) are loaded into both R/C 1
and R/C 2, and the check bits of X are loaded only into R/C 1. The check bits from R/C 1
are checked against the check bits generated from the information in R/C 2. The loaded
value is checked to ensure that the correct value has been loaded for subsequent decrementing.
Figure 5-3. Register/Counter Load Checker.
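
The cross-check between the two register copies can be sketched as follows. This is a functional illustration only: the register width of 12 bits, the function names, and the `fault_rc2` parameter (a bit mask used to inject a hypothetical fault into the second copy) are assumptions made for the example.

```python
WIDTH = 12  # assumed register width


def zeros_in(value, width=WIDTH):
    return width - bin(value & ((1 << width) - 1)).count("1")


def load_and_check(x_info, x_check, fault_rc2=0):
    """Load X into both copies and compare R/C 1 check bits with R/C 2 data.

    Returns (rc1_info, rc2_info, ok).
    """
    rc1_info, rc1_check = x_info, x_check   # R/C 1 holds data and check bits
    rc2_info = x_info ^ fault_rc2           # R/C 2 holds data only
    ok = rc1_check == zeros_in(rc2_info)
    return rc1_info, rc2_info, ok
```

For example, `load_and_check(0x00F, 8)` reports no error, while flipping one bit of the second copy with `fault_rc2=0x001` changes its zero count and is flagged.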


The PLA control checker, Figure 5-4, works in the following way. The check bits of
the input control signals (I) are subtracted from the modified Berger code outputs of the
PLA. The difference should be the codeword of the 12-bit PLA outputs and is compared
with the codeword generated from the PLA output control signals. The other two PLA
inputs, CC and R=0, are compared with their two-rail counterparts, the complemented CC
external input and the zero-detection signal from R/C 2, respectively. Two inverted control
signals, PUSH and POP, that are not primary outputs of the PLA are checked against their
complements. Furthermore, the output of the R/C load checker is input into the PLA control
checker. Because of the differing delay times of the various inputs, the checker is arranged
to add a minimum amount of delay.

For a checker to be TSC, it must receive all possible input vectors so that all
possible faults in the check-bit generator are exercised. The PLA control checker cannot meet
this requirement because the PLA outputs are restricted to the specified patterns. This problem
can be solved by sharing the check-bit generator between the two checkers. Because there is no
restriction on the values loaded into the R/C, all possible input vectors can be produced.
Because the two checkers operate at different times, the R/C load checker and the PLA control
checker can easily share one check-bit generator without any timing penalty. Since a check-bit
generator requires a relatively large chip area, the sharing scheme also saves area.

5.3. Chip Layout

The floor plan of the MCU is shown in Figure 5-5. The designs for the PLA cells and
the input/output pads are described in [HoSe80].

Because of the CED requirement, there are two layout constraints. The first
constraint concerns the control signal fanout lines. Control signals to duplicated modules must
come from different fanout lines. If the duplicated modules receive control signals from the
same fanout lines, faults on the control lines could cause the same errors in both of the
modules; therefore, these errors would be undetectable. Control signals to modules that are
not duplicated, such as the stack and the multiplexer, are fanout lines from the control bus
and are fed back to the control bus. Fanouts from the clock and power buses are treated in the
same way as the control signal fanouts: they are fed back to the original source.

The second constraint concerns the placement of checkers. The PLA control
checker must be placed at the end of the control bus, after all the fanouts and feedbacks.
The R/C load checker must be placed so as to ensure that at least one of the two R/C copies
has the correct value.


CHAPTER  6 


EVALUATION  AND  COMPARISON 


6.1. Chip Evaluation

The chip measures 2788 x 2190 microns, with lambda = 2 microns, in nMOS
technology. It contains 4600 transistors and dissipates an estimated 0 watts of power with a
5 volt power supply. There are a total of 52 pads: 29 input pads and 23 output pads. A
plot of the complete chip layout appears in Figure 6-1.

The area redundancy due to CED for the various modules is shown in Table 6-1.
The PLA requires no extra AND terms for the check bits, and the three extra outputs
account for only 0.7% additional chip area. The redundancy of the R/C consists of one copy
of the R/C, the check-bit buffers, and the bus to the R/C load checker. The redundancy of the
UPC includes one copy of the UPC and both copies of the check-bit generator. The
redundancy of the stack is in the storing of the check bits. The above three areas also include
areas due to control fanout lines. Together, the R/C load checker and the PLA
control checker require a total of 19% extra chip area. Because of the constraint on the control
lines, the control bus must be routed across the chip. The address bus requires redundant
area for the check bits. The addition of eight input pads and eight output pads accounts for
14.8% extra area. Because of the placement of the different modules, there is some wasted
area in the layout.

Table 6-1. MCU Area Redundancy.

For timing evaluation, TSIM, a MOS timing simulator, is used. Inputs to the simulator
are transistor ratios and load capacitances extracted from the layout. Based on simulation,
the MCU can be operated with a 300 nanosecond clock cycle. During PHI1, the PLA decodes
the instruction. During PHI2, the address and its check bits are generated. Internal
operations start during PHI2, and some are carried into PHI1 of the next clock cycle. The
load checker begins checking during PHI2 and sends its 2-bit output to the PLA control
checker during PHI1 of the next clock cycle. The PLA control checker starts checking
during PHI1 of the next clock cycle, and the status signals become available during PHI2.
Based on the above timing operation, the critical path for PHI1 is the decoding of the
instruction by the PLA. The critical path for PHI2 is the generation of register-zero (R=0)
by the R/C because the R=0 signal is needed for the PLA decoding of the next instruction.
The MCU cycle timing waveforms are shown in Figure 6-2.

6.2. Comparison

Since the MCU is based on Wong's design, a comparison is made between the two
designs. To evaluate the design approach of the MCU, the MCU is also compared with two
other sequencer designs: a simplex sequencer and a single-chip sequencer with duplicated
control units.

Figure 6-2. Cycle Timing Waveforms.

6.2.1. Comparison to Wong's Design

This design of the MCU improves on Wong's MCU (WMCU) both in chip
size and in timing performance. The improvement in chip size results from several
factors, as mentioned in Chapter 4. A different set of design rules is used, and lambda is
changed from 2.5 microns to 2 microns. Moreover, several functional modules are eliminated.
The improvement in timing performance can be attributed to the pipelining of instructions in
our design. In addition, because of the changes in design rules, lambda, and the design
of some basic cells, the delay time of various functional modules has been decreased
drastically.

6.2.2.  Comparison  to  a  Simple  and  a  Duplicated  MCU 

This  MCU  design  is  compared  with  two  other  sequencers:  a  simplex  sequencer  and  a 
single  chip  sequencer  with  duplicated  control  units.  The  simplex  sequencer  (SMCU)  has 
no  checker  and  the  information  bits  are  not  encoded.  The  duplicated  sequencer  (DMCU),  as 
shown  in  Figure  6-3,  has  the  same  number  of  input/output  pads  as  the  MCU;  however, 
internally  it  contains  duplicated  copies  of  the  SMCU  without  the  I/O  pads.  To  provide 
CED  on  the  DMCU,  all  input  signals  must  be  checked  against  their  check  bits;  therefore, 
two  input  checkers  are  needed  for  the  instruction  and  the  external  address  inputs.  Also, 
check  bits  must  be  generated  for  the  output  address,  and  an  output  checker  is  needed  for 
comparing  the  outputs  from  the  two  copies  of  the  SMCU. 

The chip size, timing performance, and power dissipation for the SMCU, MCU, and
DMCU are shown in Table 6-2. The area redundancies of the MCU and the DMCU are 118%
and 138%, respectively. The high redundancy of the MCU can be accounted for by the
duplication of the register/counter and the UPC. Because of the CED constraint on the
control signal lines, a significant part of the redundancy is also due to routing. The DMCU has
redundancy due to the input and output checkers, the extra I/O pads, and the complete
duplication of the SMCU.

Figure 6-3. Duplicated MCU (DMCU).

Table 6-2. Comparison Between SMCU, MCU, and DMCU.

         Chip Size      %AR
DMCU     4890 x 2980    138

%AR - Area Redundancy (extra area / the area of the SMCU)

%PP - Performance Penalty (increase in clock cycle / the clock cycle of the SMCU)

%PDP - Power Dissipation Penalty (increase in power dissipation / the power dissipation of the SMCU)

The MCU pays no performance penalty for CED. Error detection can be done with no
interference in the normal operation. On the other hand, the DMCU has a performance
penalty of 17%. The penalty is caused by the fact that the check bits must be generated after
the address is available.
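
The percentages quoted in this section can be turned into relative figures with a few lines of arithmetic. The sketch below uses only numbers stated in the text (118%, 138%, 17%, and the 300 nanosecond cycle of Section 6.1); the inference that the SMCU cycle is also 300 nanoseconds follows from the MCU having no penalty, and the function names are hypothetical.

```python
def with_penalty(baseline, percent):
    """Value after adding a percentage overhead to a baseline."""
    return baseline * (100 + percent) / 100


def penalty_percent(value, baseline):
    """Overhead of value over baseline, as a percentage of the baseline."""
    return 100 * (value - baseline) / baseline


# Areas relative to an SMCU area of 1.
mcu_area = with_penalty(1, 118)
dmcu_area = with_penalty(1, 138)
dmcu_over_mcu = dmcu_area / mcu_area

# Cycle times, taking the SMCU cycle equal to the MCU cycle.
smcu_cycle_ns = 300
dmcu_cycle_ns = with_penalty(smcu_cycle_ns, 17)
```

On these figures the DMCU occupies about 1.09 times the area of the MCU and would run with a cycle of about 351 nanoseconds, which matches the description of the MCU as only slightly better on both counts.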

From the standpoint of area redundancy and performance penalty, the MCU is a
slightly better design than the DMCU. The MCU has less area redundancy than the DMCU
and has no performance penalty compared to the SMCU. However, if the slight
improvements in area redundancy and performance are not crucial to the chip requirements, the
DMCU would be a better choice in terms of the design and layout turn-around time. The
turn-around time of the DMCU will be shorter than that of the MCU because there are no
special layout constraints for designing the SMCU cell. Special layout constraints, as
mentioned in Section 5.3, apply only when placing the input and output checkers after
duplicating the SMCU cell.

CHAPTER 7

CONCLUSIONS 


The microprogram control unit design proposed in this thesis provides a valuable
method for on-chip concurrent error detection. The CED MCU requires more than double
the chip area of a simplex MCU, but it does not suffer performance
degradation. For CED, the MCU is a more favorable design than a duplicated MCU because
the MCU has smaller area redundancy and better timing performance; however, under
general conditions, the DMCU is a better choice because it offers better fault coverage and is
easier to design and to lay out.

We plan to fabricate this layout. Once the chip is available, the design can go through
hardware evaluation to verify its performance.

Many improvements can be made to the MCU design, especially in
terms of area redundancy. The duplication of the incrementer and the decrementer
requires 13% and 23.9% extra area, respectively. These numbers can be reduced by using
a totally self-checking incrementer and decrementer. Area redundancy can also be improved
by including a second metal layer and by using careful layout techniques to minimize the
amount of wasted area.

Possible future research concerns the inclusion of retry capability in the chip so that
transient errors can be automatically tolerated. Our MCU design would then have less
area redundancy because, for the DMCU to provide concurrent error detection, the duplicated
control unit must be an MCU with its own retry capability and not an SMCU. Another
possibility for future research is the addition of ROM to the MCU to create a single-chip
total microprogram controller. The MCU approach may be more favorable than the DMCU
approach because the area constraint is very important in this case.


APPENDIX  A 


BASIC  CELLS 


In the following few pages, basic cells for:

[1] Noninverting and inverting super buffers.

[2]  4-input  totally  self-checking  checker. 

[3]  Adders  and  subtractors. 

[4]  Register/Counter. 

[5]  Microprogram  counter. 

[6]  Stack. 

are  shown  in  mixed  notation  or  in  block  diagram. 

Figure A-4. Register/Counter Cell (RCCELL).

APPENDIX B


INPUT  AND  OUTPUT  PAD  ASSIGNMENTS 

There are a total of 52 input/output pads, and the pad assignments are shown in Table
B-1. Each pad is assigned a number, proceeding clockwise from the bottom
left corner to the bottom right corner of the chip, as shown in Figure 6-1.